Run debug CI nightly instead of on every PR#9
Merged
Conversation
Backport of duckdb#1121 — debug CI runs are slow (>2h), so move to a nightly schedule plus manual dispatch instead of running on every push/PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bill-ph
added a commit
that referenced
this pull request
May 8, 2026
Speed up CI via two independent levers: 1. MainDistributionPipeline.yml: route the linux_amd64 / linux_arm64 build matrix entries to 16-core Depot managed runners (depot-ubuntu-24.04-16 / depot-ubuntu-24.04-arm-16) via the upstream workflow's runners JSON input. Cuts the build from ~25-30min to ~5-10min, including cold-cache cases. Requires PostHog/ducklake to be added to the Depot GitHub App installation; until then the build jobs will queue-time-out at 10m. 2. Five per-PR test workflows (Catalogs, ConfigTests, DeletionVectors, MinIO, NoInline): bump hendrikmuhs/ccache-action max-size from the 500MB default to 1.5G. DuckDB's full release build produces 1-2 GB of object files, so the default forces aggressive LRU eviction and keeps effective hit rate at 30-50%. 1.5G fits most of the build, pushing warm-cache hit rates into the 80-90% range. 5 caches x 1.5G = 7.5 GB, leaving headroom in the 10 GB GitHub Actions cache budget per repo. Cold-cache cases (DuckDB submodule bumps) will still take ~25-30min but are infrequent. Debug.yml is now nightly-only after #9 and is left alone. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4 tasks
bill-ph
added a commit
that referenced
this pull request
May 9, 2026
Speed up CI via two independent levers: 1. MainDistributionPipeline.yml: route the linux_amd64 / linux_arm64 build matrix entries to 16-core Depot managed runners (depot-ubuntu-24.04-16 / depot-ubuntu-24.04-arm-16) via the upstream workflow's runners JSON input. Cuts the build from ~25-30min to ~5-10min, including cold-cache cases. Requires PostHog/ducklake to be added to the Depot GitHub App installation; until then the build jobs will queue-time-out at 10m. 2. Five per-PR test workflows (Catalogs, ConfigTests, DeletionVectors, MinIO, NoInline): bump hendrikmuhs/ccache-action max-size from the 500MB default to 1.5G. DuckDB's full release build produces 1-2 GB of object files, so the default forces aggressive LRU eviction and keeps effective hit rate at 30-50%. 1.5G fits most of the build, pushing warm-cache hit rates into the 80-90% range. 5 caches x 1.5G = 7.5 GB, leaving headroom in the 10 GB GitHub Actions cache budget per repo. Cold-cache cases (DuckDB submodule bumps) will still take ~25-30min but are infrequent. Debug.yml is now nightly-only after #9 and is left alone. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
fuziontech
added a commit
that referenced
this pull request
May 27, 2026
* Publish DuckLake release binaries on tags (#7) * ci: bigger runner for build, bigger ccache for tests (#8) Speed up CI via two independent levers: 1. MainDistributionPipeline.yml: route the linux_amd64 / linux_arm64 build matrix entries to 16-core Depot managed runners (depot-ubuntu-24.04-16 / depot-ubuntu-24.04-arm-16) via the upstream workflow's runners JSON input. Cuts the build from ~25-30min to ~5-10min, including cold-cache cases. Requires PostHog/ducklake to be added to the Depot GitHub App installation; until then the build jobs will queue-time-out at 10m. 2. Five per-PR test workflows (Catalogs, ConfigTests, DeletionVectors, MinIO, NoInline): bump hendrikmuhs/ccache-action max-size from the 500MB default to 1.5G. DuckDB's full release build produces 1-2 GB of object files, so the default forces aggressive LRU eviction and keeps effective hit rate at 30-50%. 1.5G fits most of the build, pushing warm-cache hit rates into the 80-90% range. 5 caches x 1.5G = 7.5 GB, leaving headroom in the 10 GB GitHub Actions cache budget per repo. Cold-cache cases (DuckDB submodule bumps) will still take ~25-30min but are infrequent. Debug.yml is now nightly-only after #9 and is left alone. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> * [codex] Route Postgres DuckLake metadata through passthrough (#5) * Route Postgres metadata reads through passthrough * Fix Postgres metadata passthrough regressions * Keep Postgres thread cache setting test-scoped * Fix Postgres test pool setting init order * Update postgres scanner for pool defaults * Route Postgres global stats through passthrough (#11) * Remove DuckDB extension deploy job * Pin inlined table registry to snapshot schema * Fix duplicate Query(string&) after v1.5.3 rebase Rebasing the Postgres metadata passthrough patch (#5) onto DuckDB v1.5.3 replayed an old DuckLakeMetadataManager::Query(string&) body (returning transaction.Query(query)) as a second definition alongside upstream's new implementation, which uses SubstituteCatalogPlaceholders + transaction.ExecuteRaw. Drop the stale duplicate; upstream's version is correct for the v1.5.3 metadata-manager architecture. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: pin GitHub Actions to commit SHAs The v1.5.3 workflows inherited from upstream use tag/branch refs (actions/checkout@v4, ccache-action@main, etc.), which the PostHog org ruleset rejects ("all actions must be pinned to a full-length commit SHA"). Pin every action to its commit SHA, reusing the SHAs vetted in #17 on main where applicable. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix infinite recursion in Postgres metadata passthrough PostgresMetadataManager::ExecuteQuery builds a `CALL postgres_query(...)` passthrough statement and ran it via transaction.Query(). In DuckDB v1.5.3's metadata architecture transaction.Query() delegates back to metadata_manager->Query(), i.e. PostgresMetadataManager::Query(), which calls ExecuteQuery() again and wraps the statement in yet another CALL postgres_query(...) — recursing without bound and hanging the "Test Postgres" suite (SQLite, which has no passthrough, was unaffected). Run the fully-formed passthrough call via transaction.ExecuteRaw() so it goes straight to the metadata connection without re-entering the manager. This is the load-bearing change that #19 made (as RawQuery) and that was lost when #19 was skipped during the v1.5.3 rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Re-derive metadata query-intent split (#19) on v1.5.3 architecture The Postgres metadata passthrough (#5/#11) depends on a read/write/raw routing split that #19 introduced and that was dropped during the v1.5.3 rebase. v1.5.3 had collapsed all metadata access onto transaction.Query, which for a Postgres backend wrongly routed everything through the postgres_query passthrough — breaking DDL (uuid()), the metadata-exists probe, local utility queries (ATTACH, duckdb_secrets), and recursing on the passthrough CALL itself. Restore the split, adapted to v1.5.3's manager-side substitution and ExecuteRaw executor: - Transaction-level Execute/SnapshotQuery/CurrentQuery/RawQuery run RAW on the metadata connection (DuckDB executes against the attached Postgres catalog, handling uuid()/LIST/STRUCT and queries that embed postgres_query CTEs). - Metadata-manager-level Execute/SnapshotQuery/CurrentQuery do the postgres_execute/postgres_query passthrough, used (unqualified) only by the snapshot-pinned file/stats reads that benefit from server-side execution (#5/#11). Simple committed reads (e.g. global stats, inlined file deletions) also use the manager passthrough. - Call sites relabeled by intent: writes -> Execute, snapshot reads -> SnapshotQuery, current-state reads -> CurrentQuery, local utilities -> RawQuery. Notably ReadInlinedFileDeletions and other inlined-deletion reads were running through Execute (postgres_execute returns no rows), which double-applied inlined deletions. - Port the check_metadata_query_intent.py guard and wire it into CI. Verified locally against DuckDB, SQLite and Postgres backends (postgres_query/postgres_execute via Docker): all three suites green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Materialize embedded postgres_query reads (v1.5.3 streaming limit) DuckDB v1.5.3 rejects multiple streaming scans, or a streaming scan alongside an insert/CTAS, in one query. The Postgres metadata reads that embed postgres_query() run raw, so an un-materialized postgres_query streams: - GetLatestSnapshotQuery() is evaluated during write statements, so its streaming postgres_query scan collided with the insert ("Failed to query most recent snapshot ... Multiple streaming scans"). Wrap it in a MATERIALIZED CTE. - The file-column-stats CTE used NOT MATERIALIZED for single-reference columns; multiple such columns produced multiple streaming postgres_query scans. Always mark it MATERIALIZED. Negligible cost (small metadata results). Only surfaces with a streaming postgres_scanner build (CI); a materializing local build masked it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Route GetSnapshot through CALL passthrough, not streaming postgres_query The MATERIALIZED-CTE attempt didn't help: an embedded "SELECT * FROM postgres_query(...)" is a streaming scan, and GetSnapshot() runs during write statements, so it tripped DuckDB v1.5.3's "streaming scan + insert in the same query" restriction. Wrapping it in a MATERIALIZED CTE just turned it into "streaming scan + CTAS". Use the metadata-manager passthrough instead, which executes CALL postgres_query(...) as a separate statement returning a materialized result (the same path GetSnapshot(at_clause) already uses and that global stats use successfully in CI): - GetSnapshot(): call the manager-level CurrentQuery() (passthrough) instead of transaction.CurrentQuery() (raw embedded scan). - Drop the Postgres GetLatestSnapshotQuery() override so the plain base SELECT is used; the CALL passthrough wraps it. - Revert the file-column-stats CTE back to its original MATERIALIZED hint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: build via PostHog/extension-ci-tools fork (pinned actions) The PostHog org ruleset rejects any unpinned action ref, enforced transitively into reusable workflows. duckdb/extension-ci-tools uses unpinned internal actions (actions/checkout@v4, ...), so every run startup-failed at "Prepare all required actions". Point the distribution call at the PostHog extension-ci-tools fork, whose main (d3e0ee09) pins every internal action to a SHA and exposes the `runners` input, and pull ci-tools makefiles from the same fork/ref via override_ci_tools_repository. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: build linux_arm64 on GitHub-hosted runner The depot arm64 runner was unresponsive (job lost at the 10-minute runner timeout). This repo is public, so use the free GitHub-hosted ubuntu-24.04-arm runner for arm64; keep amd64 on the depot runner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Pin metadata postgres_scanner to single-threaded (materialize scans) DuckLake metadata reads (e.g. GetCatalogForSnapshot) scan several attached Postgres tables in one statement using DuckDB LIST/STRUCT syntax, so they must run on the metadata connection rather than server-side. DuckDB v1.5.3 rejects multiple streaming postgres_scanner scans (and streaming + insert) in a single query, which these multi-table reads triggered. postgres_scanner only streams a scan when max_threads > 1, where max_threads = pages / pg_pages_per_task. After attaching the Postgres metadata catalog, set pg_pages_per_task to a very large value so max_threads is pinned to 1 and these reads materialize instead of streaming. Metadata results are tiny, so there is no throughput cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Align postgres_scanner to upstream v1.5.3 pin (c0e9256) #5 bumped postgres_scanner forward to dd71d196 (the "Connection pool" rework, which adds the database-connector submodule) on the DuckDB 1.5.2 line. That newer scanner is incompatible with DuckDB v1.5.3's restriction on streaming postgres scans: DuckLake's multi-table metadata reads (GetCatalogForSnapshot, ...) failed with "Multiple streaming scans or streaming scans + CTAS / insert in the same query are not currently supported". Pin postgres_scanner to c0e9256 -- the exact version upstream DuckLake's v1.5-variegata (DuckDB v1.5.3) is tested against -- and drop the ineffective pg_pages_per_task workaround. Revisit the connection-pool scanner bump separately against a v1.5.3-compatible scanner if needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Pin postgres_scanner to the DuckDB v1.5.3 release version (6b2b12c) Root cause of the remaining Postgres CI failures: CI built an older postgres_scanner that streams multi-table scans, which DuckDB v1.5.3 rejects ("Multiple streaming scans or streaming scans + CTAS / insert in the same query are not currently supported") for DuckLake's metadata reads (GetCatalogForSnapshot scans 4 metadata tables in one query). 6b2b12c is the postgres_scanner the DuckDB v1.5.3 extension release ships; it materializes those scans instead of streaming. It is 29 commits ahead of #5's dd71d196 (so it keeps the connection-pool work) and 79 ahead of c0e9256. Verified: the full Postgres test suite passes locally against this scanner (428 test cases, 0 failures), alongside the green DuckDB and SQLite suites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: test Postgres against the prebuilt scanner, not a source build Root cause of the persistent Postgres CI failures: the test job set ENABLE_POSTGRES_SCANNER=ON and statically built postgres_scanner from source. That source build streams multi-table postgres scans, which DuckDB v1.5.3 rejects ("Multiple streaming scans or streaming scans + CTAS / insert in the same query are not currently supported") for DuckLake's metadata reads (GetCatalogForSnapshot scans 4 tables at once). The officially released postgres_scanner materializes those scans. Stop building postgres_scanner from source for the test job; install the released extension (INSTALL postgres FROM core) instead, exactly as a real deployment would load it. The full Postgres suite passes locally with the prebuilt extension (428 test cases, 0 failures). SQLite/Quack still build from source as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(quack): route catalog-load reads through the quack passthrough GetCatalogForSnapshot runs six snapshot-pinned reads via the raw transaction-level SnapshotQuery, which scans the attached metadata catalog directly. For the quack backend that produces multiple streaming quack_query scans in one statement, and quack's optimizer rejects that ("Multiple streaming scans ... not currently supported"), failing 350/394 tests in the Test Quack step. Add a SnapshotCatalogQuery seam on the metadata manager. The base implementation keeps the existing raw behavior (correct for the DuckDB backend, and for Postgres where postgres_scanner materializes the multi-table scan). QuackMetadataManager overrides it to route through CALL quack_query_by_name, which executes the DuckDB SQL server-side in a real DuckDB and returns a materialized result. Verified locally on the DuckDB, Postgres, and SQLite backends; quack is verified via CI (cannot build the quack variant locally). Intent guard still passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(quack): route all multi-table metadata reads through the passthrough The first CI run after the GetCatalogForSnapshot fix moved the failure forward: passes went 44 -> 113, the table-information error disappeared, and the next raw multi-table reads surfaced the same quack streaming-scan error (500 data-file-list, 220 extended-file-list, 16 column lookup, 16 orphan-cleanup). Generalize the seam introduced for GetCatalogForSnapshot: add a CurrentCatalogQuery counterpart to SnapshotCatalogQuery (both default to the raw transaction read; quack overrides both to its quack_query_by_name passthrough) and route the remaining raw multi-table reads through them: GetFilesForTable, the extended file list, IsColumnCreatedWithTable, GetOrphanFilesForCleanup, and GetTableSizes. Single-table raw reads are left as-is: a single streaming quack scan is allowed by quack's optimizer. The default keeps DuckDB/Postgres/SQLite on the raw path (verified: stats, compaction, alter, remove_orphans, add_files pass on all three backends). Intent guard still passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(quack): route metadata cleanup through passthrough * ci: clarify postgres_scanner.cmake is the local/source-build path only CI no longer builds postgres_scanner from source (the Postgres test job installs the released extension via `INSTALL postgres FROM core`), so the GIT_TAG pin here is only exercised by local/manual builds that set ENABLE_POSTGRES_SCANNER. Document that and why the pin matters, so the file doesn't read as the active CI scanner version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: skip flaky Quack concurrent catalog creation * test: skip Quack deletion inlining retry flake --------- Co-authored-by: Bill Guowei Yang <bill@posthog.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
v1.5-variegatabranch.Test plan
workflow_dispatchstill works🤖 Generated with Claude Code